Start the crawler
In the previous section we created our Scrapy project. Looking at this pile of files, many people are probably at a loss: how should we start this crawler? Now that we've created the Scrapy crawler from the cmd command line,
The recommended combination is xml.dom.minidom and XPath. xml.dom.minidom is part of the Python standard library, so no installation is required. XPath support comes from py-dom-xpath, an open source project hosted on Google Code. Install py-dom-xpath:
Download the
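As a rough illustration of the standard-library half of that combination, the sketch below parses a small document with xml.dom.minidom and reads elements without any XPath. The sample XML, its tag names, and the helper list_titles are invented for this example; py-dom-xpath itself is not used here.

```python
from xml.dom import minidom

# Hypothetical sample document; tag and attribute names are illustrative only.
SAMPLE = """<books>
  <book id="1"><title>Learning XML</title></book>
  <book id="2"><title>XPath Basics</title></book>
</books>"""


def list_titles(xml_text):
    """Return (id, title) pairs for every <book> element."""
    doc = minidom.parseString(xml_text)
    pairs = []
    for book in doc.getElementsByTagName("book"):
        title_node = book.getElementsByTagName("title")[0]
        # The title text lives in the element's first child, a text node.
        pairs.append((book.getAttribute("id"), title_node.firstChild.data))
    return pairs


titles = list_titles(SAMPLE)
print(titles)
```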
What is XML?
XML stands for Extensible Markup Language.
It is a subset of the Standard Generalized Markup Language (SGML), a markup language used to mark up electronic documents so that they are structured.
It
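To make "structured" concrete, here is a minimal sketch that parses a tiny XML document with the standard library and shows its tag, attributes, and child text. The document content (a note with to, from, and body elements) is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element names and values are invented for illustration.
NOTE = "<note lang='en'><to>Alice</to><from>Bob</from><body>Hello</body></note>"

root = ET.fromstring(NOTE)

# Each element has a tag, optional attributes, and optional text.
structure = {child.tag: child.text for child in root}
print(root.tag, root.attrib)
print(structure)
```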
XPath (XML Path Language) is a language used to address parts of an XML document. XSLT (Extensible Stylesheet Language Transformations) uses XPath expressions and location paths to control node
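A small sketch of how a location path selects nodes, using the limited XPath subset built into Python's xml.etree.ElementTree (not a full XPath engine). The catalog document is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document used only to demonstrate path selection.
CATALOG = """<catalog>
  <item>Top</item>
  <section name="python">
    <item>Scrapy</item>
    <item>lxml</item>
  </section>
  <section name="tools">
    <item>Chrome</item>
  </section>
</catalog>"""

root = ET.fromstring(CATALOG)

# "./section/item" walks one child step at a time from the root element.
direct = [e.text for e in root.findall("./section/item")]

# ".//item" matches <item> elements at any depth below the root.
anywhere = [e.text for e in root.findall(".//item")]

print(direct)
print(anywhere)
```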
The syntax of XPath
XPath syntax: predicates
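Predicates are the bracketed filters in an XPath step. The sketch below tries a few of the predicate forms that Python's xml.etree.ElementTree supports (attribute value, position, last(), and child text); a full XPath engine offers more. The table data is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical table; the class names and values are illustrative only.
TABLE = """<table>
  <tr><td class="name">Apple</td><td class="price">3</td></tr>
  <tr><td class="name">Pear</td><td class="price">5</td></tr>
  <tr><td class="name">Plum</td><td class="price">7</td></tr>
</table>"""

root = ET.fromstring(TABLE)

# Attribute predicate: only <td> elements whose class is exactly "name".
names = [td.text for td in root.findall(".//td[@class='name']")]

# Position predicates are 1-based; last() selects the final match.
second_row = root.find("./tr[2]")
last_row = root.find("./tr[last()]")

# Child-text predicate: the <tr> that has a <td> child with text "Pear".
pear_row = root.find("./tr[td='Pear']")
pear_price = pear_row.find("td[@class='price']").text

print(names, second_row[0].text, last_row[0].text, pear_price)
```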
Create a Scrapy project
scrapy startproject Articlespider
Create a Scrapy crawler
cd Articlespider
scrapy genspider jobbole blog.jobbole.com
How to use
You can copy an element's XPath directly from Chrome's developer tools (press F12).
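A copied expression can be tried out before it goes into a spider. The sketch below assumes a well-formed sample page and an expression of the kind Chrome's "Copy XPath" tends to produce; both are made up here, and the standard-library ElementTree is used only as a stand-in because it accepts a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed page; real HTML is often not valid XML and
# would need an HTML-aware parser instead.
PAGE = """<html><body>
  <div id="header"><a href="/">Home</a></div>
  <div id="content"><p>First</p><p>Second</p></div>
</body></html>"""

# The kind of expression "Copy XPath" might give for the second <p>
# (assumed for this sketch, not captured from a real page).
COPIED = '//*[@id="content"]/p[2]'

root = ET.fromstring(PAGE)

# ElementTree only accepts relative paths on an element, so prefix with ".".
node = root.find("." + COPIED)
print(node.text)
```

In a Scrapy spider the same expression would usually be passed to response.xpath(...) unchanged.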
Function
XPath (XML Path Language) is a language used to address parts of an XML document. XSLT (Extensible Stylesheet Language Transformations) uses XPath expressions and location paths to control node
In the previous article, we introduced the installation and configuration of the Python crawler framework Scrapy, along with other basic information. In this article, we will look at how to use the Scrapy framework to easily and quickly capture the
1. Locate the Buy button
Here, I wrote //td[@class='text-center']/button[@class='ng-isolate-scope']/span[text()='buy'], which reported that no element was found. The reason is the class value of the button, so I changed it to class='btn
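The failure described here is typical when @class='...' is compared against an attribute that holds several class names. The sketch below reproduces it on made-up markup: the button's real class value is cut off in the text, so "btn ng-isolate-scope" and the helper has_class are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the real class value is truncated in the text,
# so "btn ng-isolate-scope" is an assumption made for this sketch.
ROW = """<table><tr><td class="text-center">
  <button class="btn ng-isolate-scope"><span>buy</span></button>
</td></tr></table>"""

root = ET.fromstring(ROW)

# @class='...' compares against the whole attribute string, so matching
# on a single class token finds nothing.
exact = root.findall(
    ".//td[@class='text-center']/button[@class='ng-isolate-scope']/span"
)


def has_class(element, name):
    """True if `name` is one of the space-separated class tokens."""
    return name in element.get("class", "").split()


# Workaround: select the buttons first, then test class tokens in Python.
buttons = [
    b for b in root.findall(".//td[@class='text-center']/button")
    if has_class(b, "ng-isolate-scope")
]
spans = [s for b in buttons for s in b.findall("span") if s.text == "buy"]

print(len(exact), len(spans))
```

Engines with full XPath 1.0 support can express the same idea inside the expression with contains(@class, 'ng-isolate-scope'), or by writing out the complete class string.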
Scrapy is controlled through the scrapy command-line tool, which provides a number of commands for different purposes, each with its own parameters and options.
Some Scrapy commands must be executed under the Scrapy